Abstract

In this paper we introduce a new class of diffeomorphic smoothers based on
general spline smoothing techniques and on tools recently developed in the
context of image warping to compute smooth diffeomorphisms. This diffeomorphic
spline is defined as the solution of an ordinary differential equation governed
by an appropriate time-dependent vector field. This solution has a closed form
expression which can be computed using classical unconstrained spline smoothing
techniques. The method does not require quadratic or linear programming under
inequality constraints and therefore has a low computational cost. In a
one-dimensional setting, incorporating diffeomorphic constraints is equivalent
to imposing monotonicity. Thus, as an illustration, it is shown that such a
monotone spline can be used to monotonize any unconstrained estimator of a
regression function, and that this monotone smoother inherits the convergence
properties of the unconstrained estimator. Some numerical experiments illustrate
its finite-sample performance and compare it with another monotone estimator.
We also provide a two-dimensional application to the computation of
diffeomorphisms for landmark and image matching.
1 Introduction

Spline smoothing is widely used in many different areas to study the relationship between
a response variable Y and an independent variable X (see e.g. Wahba [42] for a detailed
presentation), and has many applications in approximation problems, see e.g. Duchon
[15], Quak & Schumaker [33], de Boor & Schumaker [12] or Lopez de Silanez & Apprato
[26]. In many fields of interest, including physical and medical sciences, one is often
interested in imposing a monotonic relationship between two such variables. Typical examples
include the analysis of dose-response curves in pharmacokinetics (Kelly & Rice [24]), growth
curves in biology and many specific practical problems discussed in the literature cited
below. Note that without loss of generality, monotone smoothing is considered in this
paper as the problem of computing an increasing function. To compute a decreasing
smoother one can simply reverse the "X axis" and then apply the same methodology.

Unconstrained spline smoothing consists in minimizing over an appropriate functional
space $\mathcal{F}$ a criterion that represents a balance between two terms: fidelity to the
data and smoothness of the fitting spline. If $\mathcal{F}$ is a Reproducing Kernel Hilbert
Space (RKHS), then it is well known (see e.g. Wahba [42]) that a closed form solution can be
computed by solving a simple linear system of equations. The simplest idea that comes to mind
to incorporate monotonicity constraints is to restrict the search space $\mathcal{F}$ to a
subset of monotone functions, and then to take as a monotone smoother the function which
minimizes the same criterion over this restricted space. Existence, approximation properties
and the numerical computation of such monotone smoothers have been widely studied,
see e.g. Utreras [10], Andersson & Elfving pQ, Elfving & Andersson [IB], Irvine, Marin & 
Smith [23], Beatson & Ziegler [7]. 

However, finding the exact solution of a smoothing problem in a constrained space
is generally a difficult task and most existing algorithms only produce approximate
solutions. As a closed form solution for such monotone smoothers does not exist in general,
their numerical computation is generally done by determining the fitted values of the
smoothing spline on a finite set of points (usually the observed covariates), using a set
of inequality constraints to impose restrictions on the value of the fitted function at these
points. The algorithms used to compute these estimators can be computationally
intensive since they involve a large set of inequality constraints (see e.g. Schwetlick
& Kunert [33], He & Shi [22], Turlach [35] and the discussion therein).

In this paper, we introduce a new class of monotone smoothers that have a closed
form expression which depends on the underlying RKHS, and which can be computed
using classical unconstrained spline smoothing techniques. Thus, unlike some monotone
smoothers, this method does not require the use of quadratic programming under linear
constraints and therefore has a low computational cost. Our approach is based on tools
that have been recently developed in the context of image warping for the construction
of diffeomorphisms in two or three dimensions (see e.g. Trouvé [38], Miller, Trouvé &
Younes [30], Glaunes [19], Younes [43], [44]; see also Apprato & Gout [2] for the use of
diffeomorphisms in spline approximation). Trouvé, Younes and their collaborators have
proposed to compute smooth diffeomorphisms as the solutions of a new class of ordinary
differential equations governed by a time-dependent velocity field. In a one-dimensional
(1D) setting, it is easy to see that a diffeomorphism is a smooth and monotone function,
and thus the main idea of this paper is to adapt such tools for the construction of 1D
monotone smoothers. Our approach also yields a new method to compute smooth
diffeomorphisms for the alignment of landmarks in a 2D or 3D setting. Some examples in the
2D case for image warping (see e.g. Bigot et al. [9]) are given in the section on numerical
experiments, but for simplicity our theoretical results are presented in a 1D setting.

Our main contributions are the following: first we show how one can generate a strictly
monotone function as the solution of an ordinary differential equation (ODE). We also
prove that for some functional classes, any monotone function can be represented as the
solution of such an ODE. Secondly, a new criterion to fit a monotone spline is proposed
and we show that the minimizer of this criterion has a simple closed form expression. As
an illustrative example, we explain how the overall methodology can be applied to the
problem of monotonizing any estimator of a regression function. Indeed, in statistics,
a possible smoothing method under shape constraints consists in first using an
unconstrained estimator (such as a spline, wavelet or kernel smoother) and then projecting
the resulting curve estimate onto a constrained subspace of regression functions which is
usually a convex set (see e.g. Mammen, Marron, Turlach & Wand [28] and Mammen &
Thomas-Agnan [29]). For the problem of monotone regression, this approach is generally
referred to as smooth and then monotonize. However, as pointed out by Gijbels [18], many
of these monotone estimates appear less smooth than the unconstrained estimates due to
the projection step. Moreover, it is not clear how one can numerically compute the
projection of a curve estimate onto a constrained subspace for an arbitrary unconstrained
estimator. Our monotone estimator does not suffer from these two drawbacks since it can be
easily computed, and it yields surprisingly smooth estimates.

The remainder of the paper is structured as follows: Section 2 gives a brief overview
of RKHS and the general spline smoothing problem. In Section 3, we show how one
can generate a strictly monotone function as the solution of an ODE. In Section 4 we
propose a new class of monotone smoothers that we call homeomorphic smoothing splines.
In Section 5 we apply this methodology to nonparametric regression under monotonicity
constraints. Section 6 presents a short Monte Carlo study on the efficiency of this approach
and a comparison with another constrained estimate. Finally, another application of
homeomorphic splines is presented in a 2D setting for matching problems involving the
alignment of landmarks. The Appendix provides the proofs of the main results.



2 The general spline smoothing problem in the 1D case

Let $\mathcal{H}_K$ be a RKHS of functions in $L^2(\mathbb{R})$ with positive definite kernel $K$, meaning that
for all $x \in \mathbb{R}$ there exists an element $K_x(\cdot)$ such that for all $g$ in $\mathcal{H}_K$, we have
$g(x) = \langle K_x(\cdot), g \rangle_K$, where $\langle \cdot, \cdot \rangle_K$ denotes the scalar product on $\mathcal{H}_K$ whose derived norm is $\|\cdot\|_K$
(for more details on RKHS we refer to Atteia [5], Atteia & Caches [S], Aronszajn [1],
Duchon [15], Wahba [42], Berlinet & Thomas-Agnan [8]). Let $\psi_1, \ldots, \psi_M$ be functions (not
necessarily in $L^2(\mathbb{R})$) such that for any set of distinct points $x_1, \ldots, x_n$ in $\mathbb{R}$, the matrix
$T$ with elements $T_{i,j} = \psi_j(x_i)$ has full rank $M < n$. Let $\mathcal{H} = \mathrm{Span}\{\psi_j\}_{j=1,\ldots,M} + \mathcal{H}_K$
and assume that we have $n$ distinct pairs of points $(x_i, y_i) \in \mathbb{R}^2$. Then the general spline
smoothing problem is to find the minimizer

$$h_{n,\lambda} = \arg\min_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (h(x_i) - y_i)^2 + \lambda \|\tilde{h}\|_K^2, \tag{2.1}$$

for any $h \in \mathcal{H}$ of the form $h(x) = \sum_{j=1}^{M} a_j \psi_j(x) + \tilde{h}(x)$, where $\tilde{h} \in \mathcal{H}_K$. It is well known
that the solution of this smoothing problem is unique and of the form: for all $x \in \mathbb{R}$,
$h_{n,\lambda}(x) = \sum_{j=1}^{M} \alpha_j \psi_j(x) + \sum_{i=1}^{n} \beta_i K(x, x_i)$, where $\alpha, \beta$ are solutions of a simple linear system of
equations (see e.g. Wahba [42]).

Throughout this paper, we will assume $M = 2$ with $\psi_1(x) = 1$ and $\psi_2(x) = x$. These
conditions are convenient for satisfying the uniform Lipschitz condition stated in Lemma
7.1 (see the Appendix).
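The linear system behind the closed form solution can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard block system $(K + n\lambda I)\beta + T\alpha = y$, $T'\beta = 0$ for criterion (2.1) with $\psi_1(x) = 1$ and $\psi_2(x) = x$, and the Gaussian kernel, its `scale` parameter and the helper names `solve`, `fit_spline` and `evaluate` are all hypothetical choices made for the example.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def gauss_kernel(s, t, scale=0.2):
    # Assumed kernel; any positive definite kernel could be used instead.
    return math.exp(-((s - t) / scale) ** 2)

def fit_spline(x, y, lam, kernel=gauss_kernel):
    """Return (alpha, beta) for h(s) = alpha[0] + alpha[1]*s + sum_i beta[i]*K(s, x[i])."""
    n = len(x)
    # Block system [[K + n*lam*I, T], [T', 0]] [beta; alpha] = [y; 0]
    A = [[kernel(x[i], x[j]) + (n * lam if i == j else 0.0) for j in range(n)]
         + [1.0, x[i]] for i in range(n)]
    A.append([1.0] * n + [0.0, 0.0])
    A.append(list(x) + [0.0, 0.0])
    z = solve(A, list(y) + [0.0, 0.0])
    return z[n:], z[:n]

def evaluate(s, x, alpha, beta, kernel=gauss_kernel):
    """Evaluate the fitted spline at the point s."""
    return alpha[0] + alpha[1] * s + sum(b * kernel(s, xi) for b, xi in zip(beta, x))
```

Since affine functions are not penalized, data lying exactly on a line are reproduced with a zero kernel part, which gives a quick sanity check of the system.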

As an example of a RKHS, we will often use the Sobolev space of order $m \in \mathbb{N}^*$
(see Berlinet and Thomas-Agnan [8]) $\mathcal{H}_K = H^m(\mathbb{R})$ endowed with the norm
$\|h\|_{H^m}^2 = \int_{\mathbb{R}} |h(x)|^2 dx + \int_{\mathbb{R}} |h^{(m)}(x)|^2 dx$. With this choice for the $\psi_j$'s, any function $h \in \mathcal{H}$ is of
the form $h(x) = a_1 + a_2 x + \tilde{h}(x)$ where $a_1, a_2 \in \mathbb{R}$ and $\tilde{h} \in \mathcal{H}_K$. Hence, one can define a
norm in $\mathcal{H}$ by setting

$$\|h\|_{\mathcal{H}} = \max(|a_1|, |a_2|) + \|\tilde{h}\|_K.$$

Note that $\mathcal{H}$ is a Banach space for this norm. From now on, to simplify the notation, we
will omit the index $\mathcal{H}$ and write $\|h\|_{\mathcal{H}} = \|h\|$.



3 Differential Equation to generate monotone functions

The smoothing spline $h_{n,\lambda}$ defined previously is not necessarily a monotone function. We
thus propose to use a connection between monotone functions and time-dependent vector
fields to incorporate monotonicity constraints into the computation of $h_{n,\lambda}$.



3.1 Generating monotone functions 

Let us explain the basic ideas (as described e.g. in Younes [33]) to generate smooth
diffeomorphisms. Take any $v \in C^1(\mathbb{R}, \mathbb{R})$ with $\|v'\|_\infty < +\infty$. Then if $\epsilon > 0$ is chosen sufficiently
small, the perturbation $\phi = Id + \epsilon v$ of the identity function is a strictly increasing small
diffeomorphism (its derivative $1 + \epsilon v'$ is positive as soon as $\epsilon \|v'\|_\infty < 1$). Now, if
$v_{t_1}, \ldots, v_{t_p}$ are continuously differentiable functions on $\mathbb{R}$ and if
$\epsilon > 0$ is sufficiently small such that the maps $Id + \epsilon v_{t_k}$ are small diffeomorphisms on $\mathbb{R}$, then we can
construct the following sequence of diffeomorphisms $\phi_{t_p} = (Id + \epsilon v_{t_{p-1}}) \circ \cdots \circ (Id + \epsilon v_{t_1})$.
Then, note that $\phi_{t_{p+1}} = (Id + \epsilon v_{t_p}) \circ \phi_{t_p} = \phi_{t_p} + \epsilon v_{t_p} \circ \phi_{t_p}$, which can also be written as

$$\forall x \in \mathbb{R}, \quad \frac{\phi_{t_{p+1}}(x) - \phi_{t_p}(x)}{\epsilon} = v_{t_p}(\phi_{t_p}(x)). \tag{3.1}$$

As $\epsilon \to 0$ and $\lim_{p \to +\infty} t_{p+1} - t_p = 0$, (3.1) looks like a discretized version of an
inhomogeneous differential equation of the following form (by introducing a continuous time
variable $t$):

$$\frac{d\phi_t}{dt} = v_t(\phi_t). \tag{3.2}$$

In the sequel, $v$ will be a function of two variables $(t, x)$, and for a fixed $t$ we will use
the notation $x \mapsto v_t(x)$ to refer to the map $x \mapsto v(t, x)$. The variable $t$ varies in
the finite time interval $[0, 1]$ while $x$ belongs to $\mathbb{R}$. Similarly, $\phi$ depends both on the time
$t$ and the variable $x$, and $\phi_t(x)$ will refer to $\phi(t, x)$. Thus, equation (3.2) is equivalent to:
$\forall x \in \mathbb{R}, \ \frac{\partial \phi(t,x)}{\partial t} = v(t, \phi(t, x))$.

As we will see, under mild conditions on the time-dependent vector field $(v_t)_{t \in [0,1]}$,
the solution of the above ODE is a diffeomorphism at all times $t$ and thus a monotone
function. The main idea of this paper is thus the following: we transfer the problem of
computing a monotone spline from a set of $n$ data points $(x_i, y_i)_{i=1,\ldots,n}$ to the problem of
computing an appropriate vector field $(v^n_t)_{t \in [0,1]}$ which depends on these data points. A
monotone smoother $f_n$ is then defined as the solution at time $t = 1$ of the ODE (3.2)
governed by the vector field $(v^n_t)_{t \in [0,1]}$, i.e. $f_n(x) = \phi_1(x)$ with $\phi_0(x) = x$. The main
advantage of this approach is that the computation of $(v^n_t)_{t \in [0,1]}$ will be obtained from
an unconstrained smoothing problem, and therefore the calculation of $f_n$ only requires
running an ODE without imposing any specific constraints. This yields a fitting function
which is guaranteed to be monotone.
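The composition of small perturbations of the identity described above can be sketched numerically. The velocity field `v` below is a hypothetical example chosen only because its spatial derivative is bounded by $2\pi$; it is not a field estimated from data, and `flow` and `steps` are illustrative names.

```python
import math

def v(t, x):
    # Hypothetical smooth velocity field; |dv/dx| <= 2*pi for all t and x.
    return math.cos(2 * math.pi * t) * math.sin(2 * math.pi * x)

def flow(x, steps=50):
    """Apply Id + eps*v_{t_0}, then Id + eps*v_{t_1}, ..., with eps = 1/steps."""
    eps = 1.0 / steps  # eps * 2*pi < 1, so each factor is strictly increasing
    phi = x
    for k in range(steps):
        phi = phi + eps * v(k * eps, phi)
    return phi

grid = [i / 100 for i in range(101)]
image = [flow(x) for x in grid]
# The images of an increasing grid stay increasing, step after step.
is_increasing = all(a < b for a, b in zip(image, image[1:]))
```

Each factor has derivative $1 + \epsilon \, \partial_x v \geq 1 - 2\pi\epsilon > 0$, so the composed map is increasing without any explicit constraint being imposed.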



3.2 Vector fields and ODE 

Following the notations in Younes [44], let us state several definitions.

Definition 3.1 ($X^1$, $X^2$ and $X$) $X^1$ is the space of time-dependent vector fields $(v_t \in
\mathcal{H}, t \in [0, 1])$ such that $\|v\|_{X^1} =_{def} \int_0^1 \|v_t\| dt < +\infty$. $X^2$ is the space of time-dependent
vector fields $(v_t \in \mathcal{H}, t \in [0, 1])$ such that $\|v\|_{X^2}^2 =_{def} \int_0^1 \|v_t\|^2 dt < +\infty$. Finally, $X$ is the
set of all time-dependent vector fields $(v_t \in \mathcal{H}, t \in [0, 1])$.

The definitions of $X$, $X^1$, $X^2$ are classical in the context of time-dependent PDE which
are formulated as Banach space-valued functions, see e.g. Renardy & Rogers [31]. Note
that by the Cauchy-Schwarz inequality, $\|v\|_{X^1} \leq \|v\|_{X^2}$, and thus $X^2 \subset X^1 \subset X$. For
$v \in X^1$, we formally define an ODE governed by the time-dependent vector field $(v_t, t \in
[0, 1])$ as

$$\frac{d\phi_t}{dt} = v_t(\phi_t). \tag{3.3}$$

Definition 3.2 Let $\Omega = [0, 1]$. A function $t \mapsto \phi_t$ is called a solution of the equation
(3.3) with initial condition the identity if for all $x \in \Omega$, $t \mapsto \phi_t(x)$ is a continuous
function from $[0, 1]$ to $\mathbb{R}$, $\phi_0(x) = x$ for all $x \in \Omega$, and for all $t \in [0, 1]$ and all $x \in \Omega$,
$\phi_t(x) = x + \int_0^t v_s(\phi_s(x)) ds$.



The following theorem, whose proof is deferred to the Appendix, shows that the
solution of the equation (3.3) is unique and is a homeomorphism for all times $t \in [0, 1]$.

Theorem 3.1 Assume that the kernel $K$ is bounded on $\mathbb{R}^2$, and that there exists a
constant $C_1$ such that for any $h \in \mathcal{H}_K$

$$|h(x) - h(y)| \leq C_1 \|h\|_K |x - y|. \tag{3.4}$$

Let $v \in X^1$. Then, for all $x \in \Omega$ and $t \in [0, 1]$, there exists a unique solution of (3.3) with
initial condition the identity. Besides, for all $t \in [0, 1]$, $\phi_t$ is a homeomorphism from $\Omega$
to $\phi_t(\Omega)$.



The above uniform Lipschitz assumption (3.4) for the kernel $K$ is not restrictive as
it is satisfied in many cases of interest. Indeed, observe that for any $h \in \mathcal{H}_K$:

$$|h(x) - h(y)| = |\langle K(\cdot, x) - K(\cdot, y), h \rangle_K| \leq \|h\|_K \|K(\cdot, x) - K(\cdot, y)\|_K.$$

If $K$ is a radial kernel of the form $K(x, y) = k(|x - y|)$ for some function $k : \mathbb{R} \to \mathbb{R}$,
then the above equation implies that $|h(x) - h(y)| \leq 2 \|h\|_K |k(0) - k(|x - y|)|$. Hence,
$h$ satisfies equation (3.4) provided $k$ is uniformly Lipschitz on $\mathbb{R}$. This is the case for a
Gaussian kernel: k(\x — y\) = e~'^~^' , and also in the Sobolev case where K is given 

by (see e.g. [8]): Kix, y) = k^{\x - y\) := EZo' litp"(t"it^!^^ " 



Remark: this framework can be extended to a 2D setting for generating diffeomorphisms
of $\mathbb{R}^2$. For this, let $\mathcal{H}_2$ denote a set of smooth functions from $\mathbb{R}^2$ to $\mathbb{R}^2$ (see Section 6.2
for an example) and define $X = \{(v_t, t \in [0, 1])$ with $v_t \in \mathcal{H}_2$ for all $t \in [0, 1]\}$. Let
$\Omega$ be an open subset of $\mathbb{R}^2$ and define an ODE governed by the time-dependent vector
field $(v_t, t \in [0, 1]) \in X$ as: $\frac{d\phi_t}{dt} = v_t(\phi_t)$, with $\phi_0(x) = x$. Using arguments in the proof of
Theorem 3.1 and assuming that the functions $h$ in $\mathcal{H}_2$ are sufficiently smooth and satisfy
a uniform Lipschitz condition of the type (3.4), one can easily show that the solution
of such an ODE is unique and is a diffeomorphism from $\Omega$ to $\phi_t(\Omega) \subset \mathbb{R}^2$ for all times
$t \in [0, 1]$.

4 Homeomorphic smoothing splines 

4.1 A connection between monotone functions and time-dependent 
vector fields 

A natural question is whether any monotone function can be written as the solution
of an ODE governed by a time-dependent vector field. First, consider the case where
$\mathcal{H}_K = H^m(\mathbb{R})$ and $f$ belongs to the Sobolev space

$$H^m([0, 1]) = \left\{ f : [0, 1] \to \mathbb{R}, \ f^{(m-1)} \text{ is absolutely continuous with } \int_0^1 |f^{(m)}(x)|^2 dx < +\infty \right\}.$$

Then, if $f$ is monotone, one of our main results is the following theorem, which states that
$f$ can be represented as the solution at time $t = 1$ of an ODE:

Theorem 4.1 Assume that $\mathcal{H} = \mathrm{Span}\{1, x\} + H^m(\mathbb{R})$. Let $m > 2$ and $f \in H^m([0, 1])$
be such that $f'(x) > 0$ for all $x \in [0, 1]$ and define $\phi_t(x) = t f(x) + (1 - t) x$, for all
$t \in [0, 1]$. Then, there exists a time-dependent vector field $(v^f_t)_{t \in [0,1]}$ depending on $f$, such
that $v^f_t \in H^m(\mathbb{R})$ for all $t \in [0, 1]$ and which satisfies $\phi_1 = \phi_0 + \int_0^1 v^f_t(\phi_t) dt$, and thus
$f(x) = \phi_1(x) = x + \int_0^1 v^f_t(\phi_t(x)) dt$. Moreover, for all $t \in [0, 1]$ one has that

$$v^f_t(\phi_t(x)) = v^f_t(t f(x) + (1 - t) x) = f(x) - x = \frac{d\phi_t}{dt}(x) \quad \text{for all } x \in [0, 1]. \tag{4.1}$$

For all $t \in [0, 1]$, the function $v^f_t$ can be chosen as the unique element of minimal norm
in $H^m(\mathbb{R})$ which satisfies equation (4.1).

To the best of our knowledge, this representation of a monotone function by such
an ODE has not been used before. The formulation (4.1) suggests the following trick
to compute a monotone smoother from a set of $n$ data points $(x_i, y_i) \in [0, 1] \times \mathbb{R}$: if
one considers $y_i$ as an approximation of $f(x_i)$ for some function $f$, then to obtain a good
approximation of $v^f$, one can use the $y_i$'s to compute a vector field $v^n$ that roughly satisfies
the interpolating conditions (4.1) at the design points. More precisely, at any time $t$, the
vector field $v^n_t$ is obtained by smoothing the "data" $(t y_i + (1 - t) x_i, \ y_i - x_i)$, $i = 1, \ldots, n$.
Finally, to compute a monotone smoother we just have to run the ODE (3.3) with the
vector field $v^n$.
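As a worked illustration of this representation, consider the simplest increasing map, a linear function with positive slope. This example is ours and is only meant to show the mechanics of equation (4.1); the slope $a$ is a hypothetical parameter.

```latex
% Assumed example: f(x) = a x with a > 0, so that f'(x) = a > 0 on [0,1].
\begin{align*}
\phi_t(x) &= t f(x) + (1-t) x = \bigl(1 + t(a-1)\bigr) x,
  \qquad 1 + t(a-1) = (1-t) + t a > 0, \\
\phi_t^{-1}(y) &= \frac{y}{1 + t(a-1)}, \\
v^f_t(y) &= f\bigl(\phi_t^{-1}(y)\bigr) - \phi_t^{-1}(y)
          = \frac{(a-1)\, y}{1 + t(a-1)}, \\
\frac{d\phi_t}{dt}(x) &= (a-1)\, x
          = \frac{(a-1)\bigl(1 + t(a-1)\bigr) x}{1 + t(a-1)}
          = v^f_t\bigl(\phi_t(x)\bigr).
\end{align*}
```

Integrating from $t = 0$ to $t = 1$ gives $\phi_1(x) = x + (a - 1)x = f(x)$. In this case the field is affine in $y$ at each time, so it lies in the unpenalized part $\mathrm{Span}\{1, x\}$ of $\mathcal{H}$.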

When $\mathcal{H}_K \neq H^m(\mathbb{R})$, it is not clear if one can obtain a general correspondence between
monotone functions and their representation via a vector field $v \in X^2$. However, we believe
that the proof of Theorem 4.1 could be adapted to other RKHS.



4.2 A new monotone smoothing spline 



Let $(x_i, y_i), i = 1, \ldots, n$ be a set of data points with $x_i \in [0, 1]$ and $y_i \in \mathbb{R}$. A new
smoothing spline problem under monotonicity constraints can be formulated in the following way:
for a time-dependent vector field $v \in X$, define the "energy"

$$E_\lambda(v) = \int_0^1 \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - x_i - v_t(t y_i + (1 - t) x_i)\bigr)^2 dt + \lambda \int_0^1 \|h_t\|_K^2 dt, \tag{4.2}$$

where $v_t(x) = a^t_1 + a^t_2 x + h_t(x)$, and $\lambda > 0$ is a regularization parameter. Then,
take $v^{n,\lambda} = \arg\min_{v \in X} E_\lambda(v)$, and a monotone smoother $f^c_{n,\lambda}$ is obtained by taking
$f^c_{n,\lambda}(x) = \phi^{n,\lambda}_1(x) = x + \int_0^1 v^{n,\lambda}_t(\phi^{n,\lambda}_t(x)) dt$. The following proposition gives sufficient
conditions for the existence of $v^{n,\lambda}$:



Proposition 4.1 Assume that the conditions of Theorem 3.1 are satisfied. Assume that
$n > 2$ and that the kernel $K : \mathbb{R}^2 \to \mathbb{R}$ is continuous. Suppose that the $x_i$'s and the $y_i$'s are
such that the $n$ "design points" $t y_i + (1 - t) x_i$ are distinct in $\mathbb{R}$ for any $t \in [0, 1]$. Then,
the optimization problem (4.2) has a unique solution $v^{n,\lambda} \in X$ such that at each time
$t \in [0, 1]$, $v^{n,\lambda}_t$ is the solution of the following standard unconstrained smoothing problem:
find $v_t \in \mathcal{H}$ which minimizes

$$E^t_\lambda(v_t) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - x_i - v_t(t y_i + (1 - t) x_i)\bigr)^2 + \lambda \|h_t\|_K^2, \tag{4.3}$$

where $v_t(x) = a^t_1 + a^t_2 x + h_t(x)$. Moreover $v^{n,\lambda} \in X^2$, and $f^c_{n,\lambda}$ is a monotone function on
$[0, 1]$.



Let us remark that if one defines $\mathcal{H}^+$ as the subspace of functions $f \in \mathcal{H}$ such that $f$
is a strictly monotone function on $[0, 1]$, then a monotone smoother $f^+_{n,\lambda}$ can be defined
by minimizing the classical spline smoothing criterion over the restricted space $\mathcal{H}^+$, i.e.

$$f^+_{n,\lambda} = \arg\min_{h \in \mathcal{H}^+} \frac{1}{n} \sum_{i=1}^{n} (h(x_i) - y_i)^2 + \lambda \|\tilde{h}\|_K^2. \tag{4.4}$$

General smoothing spline problems under shape constraints such as monotonicity
have been studied in detail in Utreras [10]. Theorems proving the existence, uniqueness
and general results concerning the characterization of $f^+_{n,\lambda}$ are given in Utreras [10],
together with a study of the convergence rate of $f^+_{n,\lambda}$ in a nonparametric regression setting.
Hence, it would be interesting to study the relationship that may exist between the
estimators $f^c_{n,\lambda}$ and $f^+_{n,\lambda}$. However, we believe that this problem is not an easy task and it is
beyond the scope of this paper.

4.3 Computational aspects and the choice of $\lambda$

Numerical computation. The optimization problem (4.3) amounts to solving, at each
time $t \in [0, 1]$, a simple finite-dimensional least-squares problem, which yields a very simple
algorithm to compute a smooth increasing function: choose a discretization $t_k = \frac{k}{T}$, $k =
0, \ldots, T - 1$, of the time interval $[0, 1]$ (we took $T = 30$) and set $\phi^{n,\lambda}_{t_0}(x) = x$ for $x \in [0, 1]$.
Then repeat for $k = 0, \ldots, T - 1$: find the solution $v^{n,\lambda}_{t_k}$ of the unconstrained smoothing
problem (4.3) for $t = t_k$, and then compute $\phi^{n,\lambda}_{t_{k+1}}(x) = \phi^{n,\lambda}_{t_k}(x) + \frac{1}{T} v^{n,\lambda}_{t_k}(\phi^{n,\lambda}_{t_k}(x))$. The
proposed numerical scheme is based on

$$\phi^{n,\lambda}_{t_{k+1}} = \left(Id + \frac{1}{T} v^{n,\lambda}_{t_k}\right)(\phi^{n,\lambda}_{t_k}) \tag{4.5}$$

which replaces the theoretical relation $\phi^{n,\lambda}_{t_{k+1}} = \phi^{n,\lambda}_{t_k} + \int_{t_k}^{t_{k+1}} v^{n,\lambda}_u(\phi^{n,\lambda}_u) du$. Remark that
equation (4.5) shows that if $\frac{1}{T} \|v^{n,\lambda}_t\| < 1$ for all $t$, then $\phi^{n,\lambda}_{t_{k+1}}$ remains monotone provided $\phi^{n,\lambda}_{t_k}$
is monotone. This condition is not really restrictive since we have shown that $(v^{n,\lambda}_t)_{t \in [0,1]}$
is in $X^2$ and that $t \mapsto v^{n,\lambda}_t$ is a continuous map on $[0, 1]$. Thus, our estimator based on
the Euler scheme (4.5) remains monotone if $T$ is chosen sufficiently large, namely greater
than $\sup_{t \in [0,1]} \|v^{n,\lambda}_t\|$.

Another important question is the error made by using the Euler discretization scheme
instead of the exact ODE. This point is left open since it is far beyond the scope of this
paper, but the use of the Gronwall Lemma should make it possible to derive an upper bound on the
distance between the theoretical $\phi^{n,\lambda}_t$ and the approximation derived from (4.5).
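The whole procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: the Gaussian kernel, its `scale`, the block linear system used for the unconstrained fit, and the names `solve`, `fit_field` and `homeomorphic_spline` are all hypothetical choices; only the time stepping follows the Euler scheme (4.5).

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def kern(s, t, scale=0.3):
    # Assumed Gaussian kernel; the scale is an illustrative choice.
    return math.exp(-((s - t) / scale) ** 2)

def fit_field(X, Y, lam):
    """Unconstrained spline v(s) = a1 + a2*s + sum_j b_j K(s, X_j) fitted to (X, Y)."""
    n = len(X)
    A = [[kern(X[i], X[j]) + (n * lam if i == j else 0.0) for j in range(n)]
         + [1.0, X[i]] for i in range(n)]
    A.append([1.0] * n + [0.0, 0.0])
    A.append(list(X) + [0.0, 0.0])
    z = solve(A, list(Y) + [0.0, 0.0])
    b, a = z[:n], z[n:]
    return lambda s: a[0] + a[1] * s + sum(bj * kern(s, Xj) for bj, Xj in zip(b, X))

def homeomorphic_spline(x, y, lam, grid, T=30):
    """Euler scheme: phi_{k+1} = phi_k + (1/T) * v_{t_k}(phi_k), starting from phi_0 = Id."""
    phi = list(grid)
    resid = [yi - xi for xi, yi in zip(x, y)]          # the "data" y_i - x_i
    for k in range(T):
        t = k / T
        design = [t * yi + (1 - t) * xi for xi, yi in zip(x, y)]
        v = fit_field(design, resid, lam)              # problem (4.3) at t = t_k
        phi = [p + v(p) / T for p in phi]
    return phi
```

For instance, with `x = [i / 10 for i in range(11)]` and `y` sampled from an increasing curve, `homeomorphic_spline(x, y, 1e-3, x)` returns the fitted values at the design points. Each time step costs one linear solve of size $n + 2$.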

Choice of the regularization parameter. A fundamental issue is the choice of the
regularization parameter $\lambda$. In our simulations, we have obtained good results via an
empirical choice of $\lambda$ inspired by the generalized cross-validation (GCV) criterion of Craven
& Wahba [11], see also Girard [20] for fast cross-validation methods. For $i = 1, \ldots, n$
and $t \in [0, 1]$, define $X^t_i = t y_i + (1 - t) x_i$ and $Y_i = y_i - x_i$. Then, note that at each
time $t \in [0, 1]$ the smoothing spline $v^{n,\lambda}_t$ evaluated at the "design points" $X^t_1, \ldots, X^t_n$ is
a linear function of the observations $Y_1, \ldots, Y_n$, i.e. there exists a matrix $A_{\lambda,t}$ such that
$\hat{Y}^{n,\lambda}_t = \bigl(v^{n,\lambda}_t(X^t_1), \ldots, v^{n,\lambda}_t(X^t_n)\bigr)' = A_{\lambda,t} Y$, with $Y = (Y_1, \ldots, Y_n)'$. Therefore, to choose
the smoothing parameter $\lambda$, we simply propose to minimize the following empirical
GCV-type criterion:

$$V(\lambda) = \frac{\frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f^c_{n,\lambda}(x_i)\bigr)^2}{\int_0^1 [\mathrm{Tr}(I_n - A_{\lambda,t})]^2 dt}. \tag{4.6}$$

In the above equation, the quantity $\int_0^1 [\mathrm{Tr}(I_n - A_{\lambda,t})]^2 dt$ can be interpreted as a measure
of the degrees of freedom of the smoothing spline. The quantity $V(\lambda)$ is therefore the
classical GCV criterion, which is the ratio between the empirical error and the complexity
of a smoothing procedure. To set a good penalization parameter $\lambda$, we simply use a grid
search to minimize $V(\lambda)$.



Computational cost. The proposed method has a relatively low computational cost
compared to classical constrained optimization methods. If $t_0, \ldots, t_{T-1}$ denotes a
discretization of $[0, 1]$ (with $T$ independent of $n$), our method requires for each $t_k$ the
inversion of a symmetric definite matrix of size $n \times n$, which is possible using $O(n^3)$ operations
with a Cholesky algorithm for instance. The computational cost of our method is thus
$O(T n^3)$. Numerical computation of a constrained spline smoothing problem such as (4.4)
is generally done by using a set of inequality constraints to impose monotonicity on the
value of the fitted function at a finite number of points. However, such algorithms can be
computationally intensive since solving a general (nonconvex) problem of quadratic optimization with
linear constraints is NP-hard (see e.g. Pardalos & Vavasis [32]). Primal-dual
methods for instance can iteratively solve the problem but their complexity is larger than
$O(n^3)$.



5 A non-parametric regression problem under monotonicity constraints

Consider the standard nonparametric regression problem on a bounded interval:

$$y_i = f(x_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{5.1}$$

where $f : [0, 1] \to \mathbb{R}$ and the $\epsilon_i$ are independent and identically distributed (i.i.d.) variables
with zero mean and variance $\sigma^2$. The regression function $f$ is assumed to belong to a class
of strictly increasing functions that satisfy some smoothness conditions to be defined
later. Smoothing procedures for monotone regression can be found in He & Shi [22],
Kelly & Rice [24], Mammen [27], Mammen & Thomas-Agnan [29], Hall & Huang [21],
Mammen, Marron, Turlach & Wand [28], Dette, Neumeyer & Pilz [13] and Antoniadis,
Bigot & Gijbels [3].

In this section, we explain how homeomorphic splines can be used as a smooth and
then monotonize method. Let $\hat{f}^n$ be an unconstrained estimator obtained from the data
$(y_i, x_i), i = 1, \ldots, n$ (e.g. by spline, kernel or wavelet smoothing). Our goal is to construct
a monotone estimator $f^c_{n,\lambda}$ which inherits the asymptotic properties of the unconstrained
estimator $\hat{f}^n$ in terms of the empirical mean squared error: $R_n(\hat{f}^n, f) = \frac{1}{n} \sum_{i=1}^{n} (\hat{f}^n(x_i) -
f(x_i))^2$. For this, starting from the values $\hat{f}^n(x_i)$ instead of the observed $y_i$'s, take the
vector field $v^{n,\lambda}$ which minimizes the following criterion:

$$v^{n,\lambda} = \arg\min_{v \in X} \int_0^1 \frac{1}{n} \sum_{i=1}^{n} \bigl(\hat{f}^n(x_i) - x_i - v_t(t \hat{f}^n(x_i) + (1 - t) x_i)\bigr)^2 dt + \lambda \int_0^1 \|h_t\|_K^2 dt,$$

where $v_t(x) = a^t_1 + a^t_2 x + h_t(x)$. Then $f^c_{n,\lambda}$ is defined as the solution at time $t = 1$ of the
ODE (3.3) governed by the time-dependent vector field $v^{n,\lambda}$.

5.1 Asymptotic properties of the monotone estimator $f^c_{n,\lambda}$

The following theorem shows that under mild conditions on the design and the
unconstrained estimator, the monotone estimator $f^c_{n,\lambda}$ inherits the asymptotic properties of $\hat{f}^n$
in terms of rate of convergence. To the best of our knowledge, this is the first consistency
result on estimators defined through large diffeomorphism models governed by an ODE.



Theorem 5.1 Assume that the conditions of Theorem 3.1 are satisfied, and that the
kernel $K : \mathbb{R}^2 \to \mathbb{R}$ is continuous. Moreover, assume that the function $f$ is continuously
differentiable on $[0, 1]$ with $f'(x) > 0$ for all $x \in [0, 1]$ and that there exists a
time-dependent vector field $v^f \in X^2$ such that for all $t \in [0, 1]$:

$$v^f_t(\phi_t(x)) = f(x) - x \quad \text{for all } x \in [0, 1],$$

where $\phi_t(x) = t f(x) + (1 - t) x$. Suppose that the unconstrained estimator $\hat{f}^n$ and the
points $x_i, i = 1, \ldots, n$ satisfy the following property:

A1 for all $t \in [0, 1]$ and all $1 \leq i, j \leq n$ with $i \neq j$,

$$t \hat{f}^n(x_i) + (1 - t) x_i \neq t \hat{f}^n(x_j) + (1 - t) x_j \quad \text{a.s.} \tag{5.2}$$

Then, $v^{n,\lambda} \in X^2$ a.s. and thus $f^c_{n,\lambda}$ is a monotone function on $[0, 1]$. If we further assume
that:

A2 there exists a weight function uo : [0, 1] ^]0, +oo[ such that for any g G (^""^([0, 1], M) 
one has lim„^+oo ^ XlILi di^i) Jq g{x)uj{x)dx, 

A3 $R_n(\hat{f}^n, f) \to 0$ in probability as $n \to +\infty$.

Then, for any sequence $\lambda = \lambda_n \to 0$, there exists a deterministic constant
$\Lambda_1$ (not depending on $n$) such that with probability tending to one as $n \to +\infty$:

$$R_n(f^c_{n,\lambda_n}, f) \leq \Lambda_1 \bigl(R_n(\hat{f}^n, f) + \lambda_n\bigr).$$

Equation (5.2) may not be satisfied for time points $t_{ij}$ such that $\frac{1 - t_{ij}}{t_{ij}} = \frac{\hat{f}^n(x_i) - \hat{f}^n(x_j)}{x_j - x_i}$.
Since the function $t \mapsto \frac{1 - t}{t}$ is injective, this can only happen for a finite number of time
points $t \in [0, 1]$. Hence assumption A1 is generally satisfied provided the design points
are distinct. Moreover, if equation (5.2) is not satisfied for some points $t$, one can argue
that it is possible to modify the estimator $\hat{f}^n$ without changing its asymptotic properties
(by slightly varying e.g. the smoothing parameter used to compute it) such that (5.2) is
true for any $t \in [0, 1]$ and all $1 \leq i, j \leq n$. Note that under assumption A1, Proposition
4.1 implies that $v^{n,\lambda}$ can be easily implemented using unconstrained spline smoothing.

The assumption A2 means that in some sense the design points are sampled according
to the density $\omega(x)$. The assumption A3 is satisfied whenever the expected empirical
mean squared error $\mathbb{E} R_n(\hat{f}^n, f)$ converges to zero as $n \to +\infty$. The fact that $v^{n,\lambda} \in X^2$
guarantees that $f^c_{n,\lambda}$ is a monotone function. Moreover, one can see that if $\lambda_n$ decays as
fast as the empirical error $R_n(\hat{f}^n, f)$ then the estimator $f^c_{n,\lambda_n}$ has the same asymptotic
convergence rate as the unconstrained estimator $\hat{f}^n$. Similar results for smooth and
then monotonize approaches are discussed in Mammen & Thomas-Agnan [29] and Mammen,
Marron, Turlach & Wand [28]. However, the advantages of our approach over existing
methods are the following: it yields a monotone smoother which has a closed form
expression and which is guaranteed to be monotone on the whole interval $[0, 1]$. Moreover, our
approach is very general as it is not restricted to functions belonging to Sobolev spaces.

5.2 Optimal rate of convergence for Sobolev spaces 

Let us return to the specific case where $f \in H^m([0, 1])$ and $\mathcal{H}_K = H^m(\mathbb{R})$. The
asymptotic properties and optimal rates of convergence (in the minimax sense) of unconstrained
estimators for functions belonging to Sobolev spaces have been extensively studied (see
e.g. Nussbaum [31], Speckman [36]). The estimator $\hat{f}_{n,\gamma}$ of Speckman [36] is based on the
use of the Demmler-Reinsch spline basis and on a smoothing parameter $\gamma$ (see Speckman
[36] for further details). Speckman [36] has shown that for an appropriate choice $\gamma^*$,
$\mathbb{E} R_n(\hat{f}_{n,\gamma^*}, f) = O\bigl(n^{-\frac{2m}{2m+1}}\bigr)$ if $f \in H^m([0, 1])$, which is known to be the minimax rate of
convergence for functions belonging to Sobolev balls. This result is based on the
assumption that the design points are such that $x_i = G((2i - 1)/2n)$ where $G : [0, 1] \to [0, 1]$
is a continuously differentiable function with $G'(x) > c > 0$ for some constant $c$. Hence,
the design of Speckman [36] satisfies Assumption A2 with $\omega(x) = \frac{1}{G'(G^{-1}(x))}$, and one
can check that Assumption A1 also holds. The following corollary is thus an immediate
consequence of Theorem 4.1 and Theorem 5.1.



Corollary 1 Assume that $\mathcal{H} = \mathrm{Span}\{1, x\} + H^m(\mathbb{R})$. Let $m > 2$ and $f \in H^m([0, 1])$
be such that $f'(x) > 0$ for all $x \in [0, 1]$. Then, the monotone estimator $f^c_{n,\lambda_n}$ based on
the minimax estimator $\hat{f}_{n,\gamma^*}$ of Speckman [36] is such that $R_n(f^c_{n,\lambda_n}, f) = O_P\bigl(n^{-\frac{2m}{2m+1}}\bigr)$,
provided $\lambda_n = O\bigl(n^{-\frac{2m}{2m+1}}\bigr)$.

To obtain an adaptive choice of $\lambda$ (not depending on the unknown regularity $m$ of $f$),
the above Corollary suggests taking $\lambda_n = \frac{1}{n}$ to have a monotone estimator whose
empirical mean squared error decays as fast as $R_n(\hat{f}^n, f)$. This choice may yield satisfactory
estimates, but in our simulations a data-based choice for $\lambda$ using the GCV criterion (4.6) gives
much better results.

6 Numerical experiments 
6.1 1D case and monotonicity

Dette, Neumeyer and Pilz [13] have recently proposed another type of smooth and then
monotonize method which combines density and regression estimation with kernel smoothers.
This approach has been shown to be very successful on many simulated and real data sets
(see Dette and Pilz [14]) and we shall therefore use it as a benchmark to assess the quality
of our monotone estimator. Similarly to our approach, it requires a preliminary
unconstrained estimator $\hat{f}^n$. This estimator is then used to estimate the inverse $f^{-1}$ of the
regression function. For this, Dette, Neumeyer and Pilz [13] propose to use the following
estimator

$$\hat{m}^{-1}_n(t) = \frac{1}{N h_d} \sum_{i=1}^{N} \int_{-\infty}^{t} K_d\left(\frac{\hat{f}^n(i/N) - u}{h_d}\right) du,$$

where $K_d$ is a positive kernel function with compact support, $h_d$ a bandwidth that
controls the smoothness of $\hat{m}^{-1}_n$, and $N$ is an integer, not necessarily equal to the sample size
$n$, which controls the numerical precision of the procedure. A monotone estimator $\hat{m}_n$ is
then obtained by reflection of the function $\hat{m}^{-1}_n$ at the line $y = x$ (see Dette, Neumeyer
and Pilz [13] for further details). In Dette and Pilz [13], it is proposed to use a local
linear estimate (see Fan and Gijbels [17]) with Epanechnikov kernel for the unconstrained
estimator $\hat{f}^n$. The bandwidth $h_r$ of this unconstrained estimate is chosen as $h_r = \left(\frac{\hat{\sigma}^2}{n}\right)^{1/5}$,
where $\hat{\sigma}^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (y_{(i+1)} - y_{(i)})^2$. For the choice of the bandwidth $h_d$, it is
recommended to choose $h_d = h_r^3$. However, for a fair comparison with our data-based choice of
$\lambda$ via GCV, the best choice for $h_d$ is chosen by cross-validation via a grid search.
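The rule-of-thumb bandwidth for the unconstrained estimate can be sketched as below. This assumes the rule reads $h_r = (\hat{\sigma}^2 / n)^{1/5}$ with a difference-based variance estimate computed from responses ordered by increasing design point; the function names are hypothetical and are not taken from the cited papers.

```python
def rice_variance(y):
    """Difference-based variance estimate; y must be ordered by increasing x."""
    n = len(y)
    return sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))

def rule_of_thumb_bandwidth(y):
    """Assumed rule h_r = (sigma_hat^2 / n)^(1/5)."""
    return (rice_variance(y) / len(y)) ** 0.2
```

The variance estimate only uses successive differences of the responses, so it needs no preliminary fit of the regression function.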

We investigate the regression model with a regular design, i.e. $x_i = \frac{i}{n}, i = 1, \ldots, n$,
normally distributed errors, sample size $n = 50$ and a signal-to-noise ratio (SNR) of 3. The
signal-to-noise ratio is measured as $\mathrm{sd}(f(x))/\sigma$, where $\mathrm{sd}(f(x))$ is the estimated standard
deviation of the regression function values $f(x_i)$ over the sample $i = 1, \ldots, n$, and $\sigma$ is the true
standard deviation of the noise in the data. The monotone regression functions that we
consider are (see Dette and Pilz [13])

$$m_1(x) = \frac{\exp(20(x - 1/2))}{1 + \exp(20(x - 1/2))}, \quad m_2(x) = \frac{1}{2}(2x - 1)^3 + \frac{1}{2}, \quad m_3(x) = x^2.$$

These functions correspond to, respectively, a function with a "continuous jump", a 
strictly increasing curve with a plateau, and a convex function. The different functions 
are displayed in Figures 1-3. 
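The simulation design described above (regular design, Gaussian noise, $n = 50$, SNR of 3) can be reproduced with a short sketch. The function names `m1`, `m2`, `m3` and `simulate` are illustrative, and the use of the sample standard deviation (`ddof=1`) for $\mathrm{sd}(f(x))$ is an assumption, since the text only says "estimated standard deviation".

```python
import numpy as np

def m1(x):
    """Function with a "continuous jump"."""
    return np.exp(20 * (x - 0.5)) / (1 + np.exp(20 * (x - 0.5)))

def m2(x):
    """Strictly increasing curve with a plateau around x = 1/2."""
    return 0.5 * (2 * x - 1) ** 3 + 0.5

def m3(x):
    """Convex function."""
    return x ** 2

def simulate(m, n=50, snr=3.0, rng=None):
    """Draw one noisy sample y_i = m(x_i) + sigma * eps_i on the design x_i = i/n."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.arange(1, n + 1) / n
    signal = m(x)
    # Assumption: sd(f(x)) is the sample standard deviation of f(x_1), ..., f(x_n).
    sigma = np.std(signal, ddof=1) / snr
    y = signal + sigma * rng.standard_normal(n)
    return x, y, sigma
```

The noise level `sigma` is derived from the signal so that the ratio $\mathrm{sd}(f(x))/\sigma$ equals the requested SNR for each test function.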

In Figures 1-3 we present some curves for the estimates of these three test functions. 
A Gaussian kernel has been used to compute the homeomorphic smoothing splines (using 
other kernels gives similar results). For the choice of the regularization parameter $\lambda$ the 
GCV criterion (4.6) is used, and recall that we use cross-validation for the choice of $h_d$. As 
one can see in Figures 1-3, the homeomorphic smoothing spline based on the local linear 
estimator gives results similar to those obtained via the estimator of Dette, Neumeyer 
and Pilz [13]. However, our approach yields monotone estimators that are visually much 
smoother and very close to the true regression function in all cases. Homeomorphic smoothing 
splines also seem to give very good results even if the unconstrained estimator oscillates 
strongly, as is the case for the local linear estimator in Figure 2 and Figure 3. 



Figure 1: Signal $m_1$: the dotted line is the unknown regression function. (a) Noisy data with 
SNR = 3, (b) local linear unconstrained estimator, (c) Dette et al.'s estimator, (d) homeomorphic 
smoothing spline based on the local linear estimator. 

Figure 2: Signal $m_2$: the dotted line is the unknown regression function. (a) Noisy data with 
SNR = 3, (b) local linear unconstrained estimator, (c) Dette et al.'s estimator, (d) homeomorphic 
smoothing spline based on the local linear estimator. 

Table 1: Mean integrated squared error (MISE) over the 100 simulations for each method. 

                                  Signal m1    Signal m2    Signal m3
Homeomorphic smoothing spline     0.0032       0.00076      0.00089
Dette et al.'s estimator          0.0035       0.00098      0.0014

Figure 3: Signal $m_3$: the dotted line is the unknown regression function. (a) Noisy data with 
SNR = 3, (b) local linear unconstrained estimator, (c) Dette et al.'s estimator, (d) homeomorphic 
smoothing spline based on the local linear estimator. 

Figure 4: Simulated mean squared error with SNR = 3 (computed over 100 simulation runs) 
on signals $m_1$, $m_2$ and $m_3$: Dette et al.'s estimator (dashed curves) and homeomorphic smoothing 
spline (solid curves) for the three regression functions. 

To compare these two monotone estimates, we have used 100 simulation runs for 
each regression function. The same unconstrained estimator (a local linear estimate with 
Epanechnikov kernel) is used. For the 100 simulations, we have calculated the pointwise 
mean squared error (MSE) of the two estimates $\hat f_n^c$ and $\hat m_n$, evaluated on an equidistant 
grid of size $2n$. Curves of the MSE of the two estimates for the three test functions are displayed in Figure 4. Again, 
these simulations show that our approach performs similarly to the monotone 
estimator of Dette, Neumeyer & Pilz [13] for the signals $m_1$ and $m_2$, and outperforms 
Dette et al.'s estimator for the function $m_3$. Table 1 shows that it gives better results in 
terms of mean integrated squared error (MISE over [0, 1]) for the three test functions. 
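The two error criteria used above can be computed as in the following sketch, given an array of estimates evaluated on the equidistant grid of size $2n$. The function names are illustrative, and approximating the integral over [0, 1] with the trapezoidal rule is an assumption; the text does not say which quadrature was used.

```python
import numpy as np

def pointwise_mse(estimates, truth):
    """Pointwise MSE over simulation runs.

    estimates: array of shape (n_sim, n_grid), one estimated curve per run.
    truth:     array of shape (n_grid,), the true regression function on the grid.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.mean((estimates - truth) ** 2, axis=0)

def mise(estimates, truth, grid):
    """Integrate the pointwise MSE over the grid with the trapezoidal rule."""
    mse = pointwise_mse(estimates, truth)
    return float(np.sum(0.5 * (mse[1:] + mse[:-1]) * np.diff(grid)))

n = 50
grid = np.linspace(0.0, 1.0, 2 * n)   # equidistant evaluation grid of size 2n
```

Each row of `estimates` would hold one monotone estimate evaluated on `grid`, so that `pointwise_mse` gives the curves of Figure 4 and `mise` the entries of Table 1.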



6.2 2D experiments and diffeomorphic matching 

Let $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ be two sets of $n$ landmarks in $\mathbb{R}^2$. The problem of landmark matching is to find a function $f : \mathbb{R}^2 \to \mathbb{R}^2$ such that $f(x_i) \approx y_i$ for all $i = 1, \ldots, n$ (see e.g. Camion & Younes [10] and references therein). Let $\mathcal{H}_{K,2}$ be a RKHS of functions in $L^2(\mathbb{R}^2)$ with positive definite kernel $K$ and denote by $\mathcal{H}_2$ the set of functions $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by 

$$f(x) = Ax + b + \begin{pmatrix} h_1(x) \\ h_2(x) \end{pmatrix}, \quad x \in \mathbb{R}^2,$$

where $h_1, h_2 \in \mathcal{H}_{K,2}$, $A$ is a $2 \times 2$ matrix and $b \in \mathbb{R}^2$. Landmark matching can be formulated as the problem of finding the minimizer 

$$f_{n,\lambda} = \arg\min_{f \in \mathcal{H}_2} \frac{1}{n} \sum_{i=1}^{n} \|f(x_i) - y_i\|_{\mathbb{R}^2}^2 + \lambda \left( \|h_1\|_K^2 + \|h_2\|_K^2 \right).$$

Under mild assumptions on the landmarks, the solution of this matching problem is unique and of the form: for all $x \in \mathbb{R}^2$, 

$$f_{n,\lambda}(x) = Ax + b + \begin{pmatrix} \sum_{i=1}^{n} \beta_{1,i} K(x, x_i) \\ \sum_{i=1}^{n} \beta_{2,i} K(x, x_i) \end{pmatrix},$$

where $A$, $b$, $\beta_1$, $\beta_2$ are solutions of a simple linear system of equations (see e.g. Camion & Younes [10]). However, no constraint in this approach guarantees that $f_{n,\lambda}$ is a one-to-one mapping of $\mathbb{R}^2$. Indeed, foldings are possible for small values of $\lambda$, as shown in Figure 5, where the mapping $f_{n,\lambda}$ is displayed via the deformation of an equally spaced grid of points in $\mathbb{R}^2$. 
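As an illustration of the unconstrained matching problem above, the following sketch solves the associated linear system coordinate by coordinate. It is a minimal example under stated assumptions: a Gaussian kernel with a hypothetical bandwidth `sigma`, the standard block system for a kernel spline with an affine null space, and illustrative function names. The helper `min_jacobian_det` is a finite-difference diagnostic added here for convenience; a negative value signals a folding of the grid.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=0.3):
    """Gaussian kernel matrix K(x, z) = exp(-|x - z|^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_affine_kernel_spline(X, Y, lam, sigma=0.3):
    """Minimize (1/n) sum |f(x_i) - y_i|^2 + lam (|h_1|^2 + |h_2|^2), f = Ax + b + h.

    The coefficients solve (K + n lam I) beta + T alpha = Y and T' beta = 0,
    where T = [1, x_i] carries the affine part.
    """
    n, d = X.shape
    K = gaussian_kernel(X, X, sigma)
    T = np.hstack([np.ones((n, 1)), X])
    M = np.block([[K + n * lam * np.eye(n), T],
                  [T.T, np.zeros((d + 1, d + 1))]])
    rhs = np.vstack([Y, np.zeros((d + 1, Y.shape[1]))])
    sol = np.linalg.solve(M, rhs)
    beta, alpha = sol[:n], sol[n:]

    def f(Z):
        Z = np.atleast_2d(np.asarray(Z, dtype=float))
        affine = np.hstack([np.ones((len(Z), 1)), Z]) @ alpha
        return affine + gaussian_kernel(Z, X, sigma) @ beta

    return f

def min_jacobian_det(f, m=30, eps=1e-5):
    """Smallest finite-difference Jacobian determinant of f on an m x m grid of [0,1]^2."""
    g = np.linspace(0.0, 1.0, m)
    P = np.array([(a, b) for a in g for b in g])
    ex, ey = np.array([eps, 0.0]), np.array([0.0, eps])
    dfx = (f(P + ex) - f(P - ex)) / (2 * eps)
    dfy = (f(P + ey) - f(P - ey)) / (2 * eps)
    return float((dfx[:, 0] * dfy[:, 1] - dfx[:, 1] * dfy[:, 0]).min())
```

Nothing in this construction forces the Jacobian determinant to stay positive, which is the lack of a one-to-one guarantee discussed above.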




Figure 5: (a) $n = 6$ landmarks to be aligned: $x_1, \ldots, x_n$ (circles) and $y_1, \ldots, y_n$ (stars). Four 
landmarks at the corners of the grid are already at the same location. The bold diagonal lines 
represent the two landmarks to be aligned. Landmark matching with unconstrained spline smoothing: 
with a large $\lambda$ (b), a moderate $\lambda$ (c) and a small $\lambda$ (d). Landmark matching with homeomorphic 
spline: with a large $\lambda$ (e), a moderate $\lambda$ (f) and a small $\lambda$ (g). 



6.3 Homeomorphic spline for diffeomorphic matching 

Let $\mathcal{X} = \{(v_t, t \in [0, 1]) \text{ with } v_t \in \mathcal{H}_2 \text{ for all } t \in [0, 1]\}$. Diffeomorphic matching of two sets of landmarks in $\mathbb{R}^2$ by homeomorphic spline is defined as the problem of finding a time-dependent vector field $v \in \mathcal{X}$ which minimizes the "energy" 

$$E_\lambda(v) = \int_0^1 \frac{1}{n} \sum_{i=1}^{n} \|y_i - x_i - v_t(t y_i + (1 - t) x_i)\|_{\mathbb{R}^2}^2 \, dt + \lambda \int_0^1 \left( \|h_{1,t}\|_K^2 + \|h_{2,t}\|_K^2 \right) dt, \qquad (6.1)$$

with $\lambda > 0$ a regularization parameter and $v_t(x) = A_t x + b_t + \begin{pmatrix} h_{1,t}(x) \\ h_{2,t}(x) \end{pmatrix}$, where for each $t \in [0, 1]$, $h_{1,t}, h_{2,t} \in \mathcal{H}_{K,2}$, $A_t$ is a $2 \times 2$ matrix and $b_t \in \mathbb{R}^2$. Then, by taking $v^{n,\lambda} = \arg\min_{v \in \mathcal{X}} E_\lambda(v)$, a diffeomorphic mapping between these two sets of landmarks is obtained by computing $f_{n,\lambda}(x) = \phi_1^{n,\lambda}(x) = x + \int_0^1 v_t^{n,\lambda}(\phi_t^{n,\lambda}(x)) \, dt$. Under mild assumptions, and arguing as in the proof of Proposition 4.1, the optimization problem (6.1) has a unique solution $v^{n,\lambda} \in \mathcal{X}$ such that at each time $t \in [0, 1]$, $v_t^{n,\lambda}$ is the solution of the following standard unconstrained smoothing problem: find $v_t \in \mathcal{H}_2$ which minimizes 

$$E_\lambda^t(v_t) = \frac{1}{n} \sum_{i=1}^{n} \|y_i - x_i - v_t(t y_i + (1 - t) x_i)\|_{\mathbb{R}^2}^2 + \lambda \left( \|h_{1,t}\|_K^2 + \|h_{2,t}\|_K^2 \right),$$

where $v_t(x) = A_t x + b_t + \begin{pmatrix} h_{1,t}(x) \\ h_{2,t}(x) \end{pmatrix}$. Hence, the diffeomorphic mapping $\phi_1^{n,\lambda}$ is computed using unconstrained spline smoothing and by running an ODE. Numerically, we use an Euler scheme similar to the one proposed in Section 4.3. 
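The procedure described above (one unconstrained smoothing problem per time step, followed by an explicit Euler integration of the ODE) can be sketched as follows. This is a hedged illustration rather than the authors' code: the Gaussian kernel, its bandwidth `sigma`, the number of Euler steps `steps`, the left-endpoint time discretization and all function names are assumptions.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=0.3):
    """Gaussian kernel matrix K(x, z) = exp(-|x - z|^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_vector_field(Q, R, lam, sigma=0.3):
    """Unconstrained spline v_t = A_t x + b_t + h_t fitted to responses R at design points Q."""
    n, d = Q.shape
    K = gaussian_kernel(Q, Q, sigma)
    T = np.hstack([np.ones((n, 1)), Q])
    M = np.block([[K + n * lam * np.eye(n), T],
                  [T.T, np.zeros((d + 1, d + 1))]])
    sol = np.linalg.solve(M, np.vstack([R, np.zeros((d + 1, d))]))
    beta, alpha = sol[:n], sol[n:]

    def v(Z):
        affine = np.hstack([np.ones((len(Z), 1)), Z]) @ alpha
        return affine + gaussian_kernel(Z, Q, sigma) @ beta

    return v

def homeomorphic_spline(X, Y, lam, sigma=0.3, steps=50):
    """Return the map x -> phi_1(x) obtained by Euler integration of d(phi_t)/dt = v_t(phi_t)."""
    times = np.arange(steps) / steps
    # At time t the design points are t*y_i + (1-t)*x_i and the responses are y_i - x_i.
    fields = [fit_vector_field(t * Y + (1 - t) * X, Y - X, lam, sigma) for t in times]

    def f(points):
        phi = np.array(points, dtype=float)
        for v_t in fields:
            phi = phi + v_t(phi) / steps     # explicit Euler step of size 1/steps
        return phi

    return f
```

Each Euler step is a small perturbation of the identity, so the composed map stays one-to-one as long as the step size is small relative to the Lipschitz constant of the fitted fields. Note that the linear landmark paths must not collide at any time $t$, otherwise the design points of the smoothing problem are not distinct.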



Remark that the formulation (6.1) is somewhat similar to the geodesic smoothing spline problem proposed by Camion & Younes [10] in a 2D setting. To compute a smooth diffeomorphism to align two sets of landmarks $(x_i, y_i)$, $i = 1, \ldots, n$ in $\mathbb{R}^2 \times \mathbb{R}^2$, Camion & Younes [10] suggest minimizing the following energy 

$$\int_0^1 \sum_{i=1}^{n} \left\| \frac{dq_i(t)}{dt} - v_t(q_i(t)) \right\|_{\mathbb{R}^2}^2 dt + \lambda \int_0^1 \left( \|h_{1,t}\|_K^2 + \|h_{2,t}\|_K^2 \right) dt \qquad (6.2)$$

over all time-dependent vector fields and all landmark trajectories $q_i(t)$, $i = 1, \ldots, n$, with conditions $q_i(0) = x_i$ and $q_i(1) = y_i$. This leads to an optimization problem which can be solved by a gradient-descent algorithm. In our formulation (6.1), the landmark trajectories are fixed, and correspond to linear paths $q_i(t) = t y_i + (1 - t) x_i$ between $x_i$ and $y_i$. This makes the optimization problem (6.2) easier to solve. 

An example of diffeomorphic mapping is shown in Figure 5, and one can see that even 
for small values of $\lambda$ the mapping remains one-to-one, contrary to unconstrained spline 
smoothing. An example of landmark-based image warping is also displayed in Figure 6, 
which illustrates the advantage of the homeomorphic spline over unconstrained spline smoothing, 
which may lead to unrealistic matching. 




Figure 6: Image warping based on landmark alignment. Figures (a) and (b) are two images with 
manually annotated landmarks representing facial structures. Figure (c) is the warping of image 
(b) onto image (a) using landmark alignment with unconstrained spline smoothing. Figure (d) is 
the warping of image (b) onto image (a) using landmark alignment with homeomorphic spline. 
Images are taken from the IMM Face Database [37]. 



7 Conclusion and Future works 

Homeomorphic splines allow one to compute, at a low computational cost, monotone 
regressors in a 1D setting. In the presence of noisy data, this leads to an estimator which 
performs at least as well as existing ones. Moreover, such an estimator has an optimal 
rate of convergence over a large class of functional spaces. Homeomorphic splines can also 
be extended to a 2D setting for landmark matching and image warping. In future work, we plan 
to investigate applications to 2D regression under monotonicity constraints, and also to 
study the asymptotic normality of our estimator. 
Appendix 



Proof of Theorem 3.1: the key point to derive this result is the use of the following lemma, which follows immediately from condition (3.4) and the assumption $M = 2$, $\psi_1(x) = 1$, $\psi_2(x) = x$ made in Section 2. 

Lemma 7.1 Suppose that the assumptions of Theorem 3.1 are satisfied. Then, for any $g \in \mathcal{H}$, there exists a constant $C_2$ (not depending on $g$) such that for all $x, y \in \mathbb{R}$, $|g(x) - g(y)| \le C_2 \|g\| |x - y|$, and $|g(x)| \le C_2 \|g\| (1 + |x|)$. 

Existence and uniqueness of the solution of the ODE: it follows from the Picard-Lindelöf Theorem (see Theorems 19 and 20 in Younes [44], using the same notations). 

Invertibility of the solution: for $v \in \mathcal{X}^2$, $x \in \mathbb{R}$ and any $t, s \in [0, 1]$, we denote by $\phi_{st}^v(x)$ the value at time $t$ of the unique solution of the equation $\frac{d\phi_r}{dr} = v_r(\phi_r)$ which is equal to $x$ at time $s$. The solution $\phi_{st}^v(x)$ is equal to $\tilde{x} = \phi_{sr}^v(x)$ at time $r \in [0, 1]$, and thus also equal to $\phi_{rt}^v(\tilde{x}) = \phi_{rt}^v(\phi_{sr}^v(x))$, which implies that $\phi_{st}^v(x) = \phi_{rt}^v \circ \phi_{sr}^v(x)$ and thus $\phi_{ts}^v \circ \phi_{st}^v = \mathrm{Id}$, which proves that for all $t \in [0, 1]$, $\phi_t^v = \phi_{0t}^v$ is invertible, and that its inverse is given by $(\phi_t^v)^{-1} = \phi_{t0}^v$. 

Continuity of $\phi_t$ and its inverse: $\phi_t$ is a homeomorphism, using Gronwall's lemma and the arguments in Younes [44]. This ends the proof of Theorem 3.1. □ 

Proof of Theorem 4.1: let $t \in [0, 1]$ and define the interval $I_t = [t f(0), t f(1) + (1 - t)] = [a_t, b_t]$. Given our assumptions on $f$, the function $\phi_t(x) = t f(x) + (1 - t) x$ is a continuously differentiable homeomorphism from $[0, 1]$ to $I_t$ such that its inverse is continuous and differentiable with 

$$\frac{d\phi_t^{-1}}{dy}(y) = \frac{1}{t f'(\phi_t^{-1}(y)) + (1 - t)} \quad \text{for all } y \in I_t,$$

which shows that $\frac{d\phi_t^{-1}}{dy}$ is continuous since we have assumed that $f'(x) > 0$ for all $x \in [0, 1]$. Hence, $\phi_t$ is a diffeomorphism from $[0, 1]$ to $I_t$. Moreover, the above formula for $\frac{d\phi_t^{-1}}{dy}$ and the fact that $f \in H^m([0, 1])$ imply that $\phi_t^{-1}$ belongs to $H^m(I_t)$ by application of the chain rule of differentiation. Let $f_t$ be the function defined on $I_t$ such that $f_t(y) = f(\phi_t^{-1}(y)) - \phi_t^{-1}(y)$ for $y \in I_t$. Given that $\phi_t$ is a diffeomorphism from $[0, 1]$ to $I_t$ and that $\phi_t^{-1} \in H^m(I_t)$, we can again apply the chain rule of differentiation to show that $f_t$ belongs to $H^m(I_t)$. Now, define the sub-space $H_0$ of functions in $H^m(\mathbb{R})$ which coincide with $f_t$ on $I_t = [a_t, b_t]$. First, we shall prove that $H_0$ is not empty. Indeed, choose $c_t < a_t$ and $d_t > b_t$ and define the Hermite polynomials $P_0$ and $P_1$ of degree $2m + 1$ such that for all $0 \le k \le m - 1$, $P_0^{(k)}(a_t) = f_t^{(k)}(a_t)$ and $P_0^{(k)}(c_t) = 0$, and $P_1^{(k)}(b_t) = f_t^{(k)}(b_t)$ and $P_1^{(k)}(d_t) = 0$. Then, define the function $\tilde{v}_t$ on $\mathbb{R}$ such that 

$$\tilde{v}_t(y) = \begin{cases} 0 & \text{for } y \in \, ]-\infty, c_t[, \\ P_0(y) & \text{for } y \in [c_t, a_t[, \\ f_t(y) & \text{for } y \in [a_t, b_t], \\ P_1(y) & \text{for } y \in \, ]b_t, d_t], \\ 0 & \text{for } y \in \, ]d_t, +\infty[. \end{cases}$$

By construction of $P_0$ and $P_1$ and the fact that $f_t \in H^m(I_t)$, we have that $\tilde{v}_t$ belongs to $H^m(\mathbb{R})$, and thus $H_0$ is not empty. Since the space $H_0$ is closed and convex, it contains a unique element of minimum norm that we denote by $v_t^f$, which satisfies equation (4.1) by construction of $H_0$. This completes the proof of Theorem 4.1. □ 

Proof of Proposition 4.1: let $t \in [0, 1]$. Given our assumptions on the $x_i$'s and the $y_i$'s, we can define the function $v_t^{n,\lambda} \in \mathcal{H}$ as the smoothing spline which minimizes the energy $E_\lambda^t(v_t)$ as defined in equation (4.3). Then, by definition of $v_t^{n,\lambda}$ we have that for any $v_t \in \mathcal{H}$, $E_\lambda^t(v_t^{n,\lambda}) \le E_\lambda^t(v_t)$, which implies that for any $v \in \mathcal{X}$, $E_\lambda(v^{n,\lambda}) = \int_0^1 E_\lambda^t(v_t^{n,\lambda}) \, dt \le E_\lambda(v)$, which proves that $v^{n,\lambda}$ is a minimum of $E_\lambda(v)$. Then, the uniqueness of $v^{n,\lambda}$ follows from the strict convexity of $E_\lambda$. 

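Now, for $i = 1, \ldots, n$ and $t \in [0, 1]$, let $X_i^t = t y_i + (1 - t) x_i$ and $Y_i = y_i - x_i$. Then, the time-dependent vector field $v^{n,\lambda}$ is such that for any $x \in \mathbb{R}$ (see Wahba [12]) 

$$v_t^{n,\lambda}(x) = \alpha_1^t + \alpha_2^t x + \sum_{i=1}^{n} \beta_i^t K(x, X_i^t),$$

where the coefficients $\alpha_t = (\alpha_1^t, \alpha_2^t)'$ and $\beta_t = (\beta_1^t, \ldots, \beta_n^t)'$ are given by $\beta_t = \Sigma_{\lambda,t}^{-1} (I_n - P_t) Y$ and $\alpha_t = (T_t' \Sigma_{\lambda,t}^{-1} T_t)^{-1} T_t' \Sigma_{\lambda,t}^{-1} Y$, with $Y = (Y_1, \ldots, Y_n)'$, $\Sigma_{\lambda,t} = \Sigma_t + n\lambda I_n$, where $\Sigma_t$ is the $n \times n$ matrix with elements $\Sigma_t[i, j] = K(X_i^t, X_j^t)$, $T_t$ is the $n \times 2$ matrix with elements $T_t[i, 1] = 1$ and $T_t[i, 2] = X_i^t$, and $P_t$ is the $n \times n$ matrix given by $P_t = T_t (T_t' \Sigma_{\lambda,t}^{-1} T_t)^{-1} T_t' \Sigma_{\lambda,t}^{-1}$. Then, by the continuity of $K$ and the continuity of matrix inversion, it follows that the coefficients $\alpha_t$ and $\beta_t$ are continuous functions of $t$ on $[0, 1]$. Hence, $t \mapsto \|v_t^{n,\lambda}\|$ is a continuous function on $[0, 1]$, which implies that $\int_0^1 \|v_t^{n,\lambda}\|^2 dt < +\infty$. Hence $v^{n,\lambda} \in \mathcal{X}^2$, which completes the proof of Proposition 4.1 using Theorem 3.1. □ 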
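The closed-form expressions for $\alpha_t$ and $\beta_t$ used in this proof can be checked numerically. The sketch below is only an illustration: the Gaussian kernel and its bandwidth are assumptions, and `spline_coefficients` is a hypothetical name. It computes the coefficients at a fixed time $t$ and returns the matrices needed to verify the normal equations $(\Sigma_t + n\lambda I_n)\beta_t + T_t \alpha_t = Y$ and $T_t' \beta_t = 0$.

```python
import numpy as np

def spline_coefficients(Xt, Y, lam, kernel):
    """Coefficients of v_t(x) = alpha_1 + alpha_2 x + sum_i beta_i K(x, X_i^t).

    Xt: design points X_i^t (shape (n,)), Y: responses Y_i (shape (n,)).
    """
    n = len(Xt)
    Sigma = kernel(Xt[:, None], Xt[None, :])          # Sigma_t[i, j] = K(X_i^t, X_j^t)
    Sigma_lam = Sigma + n * lam * np.eye(n)           # Sigma_{lambda,t}
    T = np.column_stack([np.ones(n), Xt])             # T_t[i, 1] = 1, T_t[i, 2] = X_i^t
    Si_T = np.linalg.solve(Sigma_lam, T)              # Sigma_{lambda,t}^{-1} T_t
    Si_Y = np.linalg.solve(Sigma_lam, Y)              # Sigma_{lambda,t}^{-1} Y
    alpha = np.linalg.solve(T.T @ Si_T, T.T @ Si_Y)
    beta = Si_Y - Si_T @ alpha                        # equals Sigma_{lambda,t}^{-1} (I - P_t) Y
    return alpha, beta, Sigma, T

def gauss(a, b, sigma=0.1):
    """Gaussian kernel, used here only as an example of a positive definite K."""
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))
```

With these quantities, the fitted values at the design points are `T @ alpha + Sigma @ beta`, which is the product of $P_t + \Sigma_t \Sigma_{\lambda,t}^{-1}(I_n - P_t)$ with $Y$.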

Proof of Theorem 5.1: in the proof, $C$ will denote a constant whose value may change from line to line. For $i = 1, \ldots, n$ and $t \in [0, 1]$, let $\hat{X}_i^t = t \hat f_n(x_i) + (1 - t) x_i$, $X_i^t = t f(x_i) + (1 - t) x_i$ and $Y_i = \hat f_n(x_i) - x_i$. One can remark that the smoothing spline $v_t^{n,\lambda_n}$ evaluated at the "design points" $\hat{X}_1^t, \ldots, \hat{X}_n^t$ is a linear function of the observations $Y_1, \ldots, Y_n$ and can therefore be written as $\mathbf{v}_t^{n,\lambda_n} = \left( v_t^{n,\lambda_n}(\hat{X}_1^t), \ldots, v_t^{n,\lambda_n}(\hat{X}_n^t) \right)' = A_{\lambda_n,t} Y$, where $A_{\lambda_n,t} = P_t + \Sigma_t \Sigma_{\lambda_n,t}^{-1} (I_n - P_t)$. Then, under the assumptions of Theorem 5.1, the following lemma holds (the proof follows using arguments in Craven & Wahba). 

Lemma 7.2 For almost all $t \in [0, 1]$ and any $x \in \mathbb{R}^n$, we have $\|A_{\lambda_n,t} x\|_2 \le 2 \|x\|_2$ a.s. 

The next step shows that $v^{n,\lambda_n}$ is in $\mathcal{X}^2$ with a probability asymptotically equal to 1. 

Lemma 7.3 There exists $C_5 > 0$ such that $\lim_{n \to +\infty} P\left( \sqrt{\int_0^1 \|v_t^{n,\lambda_n}\|^2 dt} \le C_5 \right) = 1$. 

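Proof: recall that $v^{n,\lambda_n}$ is the minimizer of the following energy: 

$$E_{\lambda_n}(v) = \int_0^1 \frac{1}{n} \sum_{i=1}^{n} \left( \hat f_n(x_i) - x_i - v_t(t \hat f_n(x_i) + (1 - t) x_i) \right)^2 dt + \lambda_n \int_0^1 \|h_t\|_K^2 \, dt,$$

where $v_t(x) = \alpha_1^t + \alpha_2^t x + h_t(x)$. We will show that $v^{n,\lambda_n}$ converges in probability to $v^f$ for the norm $\|v\|_{\mathcal{X}^2} = \left( \int_0^1 \|v_t\|^2 dt \right)^{1/2}$. Let $E(v) = \int_0^1 \int_0^1 \left( f(x) - x - v_t(t f(x) + (1 - t) x) \right)^2 w(x) \, dx \, dt$, let $A$ be a compact set in $\mathcal{X}^2$ and take $v \in A$: $(E_{\lambda_n}(v) - E(v))^2 \le 2 E_{1,n}^2(v) + 2 E_{2,n}^2(v)$, where $E_{2,n}(v) = \lambda_n \int_0^1 \|h_t\|_K^2 \, dt$ and 

$$E_{1,n}(v) = \int_0^1 \int_0^1 \left( f(x) - x - v_t(t f(x) + (1 - t) x) \right)^2 (dw_n(x) - w(x) dx) \, dt + \int_0^1 \int_0^1 \left[ \left( \hat f_n(x) - x - v_t(t \hat f_n(x) + (1 - t) x) \right)^2 - \left( f(x) - x - v_t(t f(x) + (1 - t) x) \right)^2 \right] dw_n(x) \, dt,$$

with $w_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x)$. Since $\lambda_n \to 0$, we have that $\sup_{v \in A} E_{2,n}^2(v) \to 0$ as $n \to +\infty$. Then, remark that $E_{1,n}^2(v) \le 2 I_{1,n}^2(v) + 2 I_{2,n}^2(v)$, where 

$$I_{1,n}(v) = \int_0^1 \int_0^1 \left( f(x) - x - v_t(t f(x) + (1 - t) x) \right)^2 (dw_n(x) - w(x) dx) \, dt,$$

$$I_{2,n}(v) = \int_0^1 \int_0^1 \left[ \left( \hat f_n(x) - x - v_t(t \hat f_n(x) + (1 - t) x) \right)^2 - \left( f(x) - x - v_t(t f(x) + (1 - t) x) \right)^2 \right] dw_n(x) \, dt.$$

Let $g_t^v(x) = \left( f(x) - x - v_t(t f(x) + (1 - t) x) \right)^2$; then $I_{1,n}(v) = \int_0^1 \int_0^1 g_t^v(x) \, dt \, (dw_n(x) - w(x) dx)$ by Fubini's theorem. Lemma 7.1 implies that $x \mapsto \int_0^1 g_t^v(x) \, dt$ is bounded on $[0, 1]$ by $\int_0^1 \left( 1 + \|f\|_\infty + C_2 \|v_t\| (2 + \|f\|_\infty) \right)^2 dt$. Then, from the compactness of $A$ it follows that there exists a constant $C$ such that for all $v \in A$, $I_{1,n}(v) \le C \int_0^1 (dw_n(x) - w(x) dx)$. Hence, by definition of $w(x)$, the inequality above finally implies that 

$$\sup_{v \in A} I_{1,n}^2(v) \to 0 \quad \text{as } n \to +\infty. \qquad (7.1)$$

Now, using the Cauchy-Schwarz inequality we have that $I_{2,n}^2(v) \le I_{3,n}(v) I_{4,n}(v)$, where 

$$I_{3,n}(v) = \int_0^1 \int_0^1 \left( \hat f_n(x) - f(x) - v_t(t \hat f_n(x) + (1 - t) x) + v_t(t f(x) + (1 - t) x) \right)^2 dw_n(x) \, dt$$

and 

$$I_{4,n}(v) = \int_0^1 \int_0^1 \left( \hat f_n(x) + f(x) - 2x - v_t(t \hat f_n(x) + (1 - t) x) - v_t(t f(x) + (1 - t) x) \right)^2 dw_n(x) \, dt.$$

Note that using Lemma 7.1 it follows that 

$$I_{4,n}(v) \le 2 R_n(\hat f_n, f) + 4 \int_0^1 \left( \|f\|_\infty + 2 + 2 C_2 \|v_t\| (2 + \|f\|_\infty) \right)^2 dt + 4 C_2^2 R_n(\hat f_n, f) \int_0^1 \|v_t\|^2 dt.$$

Then, one has that 

$$I_{3,n}(v) \le 2 \int_0^1 \int_0^1 \left( \hat f_n(x) - f(x) \right)^2 dw_n(x) \, dt + 2 C_2^2 \int_0^1 \int_0^1 \|v_t\|^2 \left( \hat f_n(x) - f(x) \right)^2 dw_n(x) \, dt,$$

and thus $\sup_{v \in A} I_{3,n}(v) \le 2 R_n(\hat f_n, f) + 2 C_2^2 \sup_{v \in A} \int_0^1 \|v_t\|^2 dt \, R_n(\hat f_n, f)$. By assumption, $R_n(\hat f_n, f) \to 0$ in probability, and thus $\sup_{v \in A} I_{3,n}(v) \to 0$ in probability as $n \to +\infty$. By combining the above equation with the bound for $I_{4,n}(v)$, we finally obtain that in probability $\sup_{v \in A} I_{2,n}^2(v) \to 0$ as $n \to +\infty$. So finally, we obtain from (7.1) that $\sup_{v \in A} E_{1,n}^2(v) \to 0$ in probability as $n \to +\infty$, which implies, together with the convergence of $\sup_{v \in A} E_{2,n}^2(v)$ to zero, that $\sup_{v \in A} (E_{\lambda_n}(v) - E(v))^2 \to 0$ in probability as $n \to +\infty$. 

Now, remark that since $E_{\lambda_n}(v)$ and $E(v)$ are positive and strictly convex functionals, they have a unique minimum over the set of time-dependent vector fields $\mathcal{X}$. Moreover, by definition of $v^f \in \mathcal{X}^2$ one has that for any $t \in [0, 1]$, $v_t^f(t f(x) + (1 - t) x) = f(x) - x$, which implies that $v^f$ is the minimum of $E(v)$. Let $\epsilon > 0$ and define $B(v^f, \epsilon) = \{v \in \mathcal{X}^2 ; \|v - v^f\|_{\mathcal{X}^2} \le \epsilon\}$, and let $\partial B(v^f, \epsilon) = \{v \in \mathcal{X}^2 ; \|v - v^f\|_{\mathcal{X}^2} = \epsilon\}$ be the frontier of $B(v^f, \epsilon)$. Since $v^f$ is the minimum of $E(v)$, there exists $\delta > 0$ such that for any $v \in \partial B(v^f, \epsilon)$, $E(v^f) + \delta < E(v)$. Obviously, $B(v^f, \epsilon)$ is a compact subset of $\mathcal{X}^2$, and thus $\sup_{v \in B(v^f, \epsilon)} |E_{\lambda_n}(v) - E(v)|$ converges to zero in probability. This implies that for any $\alpha > 0$, there exists $n_1 \in \mathbb{N}$ such that for any $n \ge n_1$, $P\left( \forall v \in \partial B(v^f, \epsilon), \; E_{\lambda_n}(v) - E(v) > -\frac{\delta}{3} \right) > 1 - \frac{\alpha}{2}$. Similarly, there exists $n_2$ such that for any $n \ge n_2$, $P\left( E(v^f) - E_{\lambda_n}(v^f) > -\frac{\delta}{3} \right) > 1 - \frac{\alpha}{2}$. Hence, we have that for any $n \ge \max(n_1, n_2)$, $P\left( \forall v \in \partial B(v^f, \epsilon), \; E_{\lambda_n}(v) > E_{\lambda_n}(v^f) + \frac{\delta}{3} \right) > 1 - \alpha$. This implies that, except on a set of probability less than $\alpha$, $E_{\lambda_n}$ has a local minimum in the interior of $B(v^f, \epsilon)$, which is thus the global minimum $v^{n,\lambda_n}$ since $E_{\lambda_n}$ is strictly convex. Hence, we finally have that for any $\epsilon > 0$, with probability tending to one $v^{n,\lambda_n}$ belongs to $B(v^f, \epsilon)$, which implies that $v^{n,\lambda_n}$ converges in probability to $v^f$ for the norm $\| \cdot \|_{\mathcal{X}^2}$. This proves that there exists a constant $C_5$ (not depending on $n$) such that as $n \to +\infty$, $P\left( \|v^{n,\lambda_n}\|_{\mathcal{X}^2} \le C_5 \right) \to 1$, which completes the proof of Lemma 7.3. □ 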

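Now, since $v^{n,\lambda_n} \in \mathcal{X}^2$, Theorem 3.1 implies that we can define $\hat f_n^c$ and $\phi_t^{n,\lambda_n}$ as the solutions respectively at time $t = 1$ and time $t \in [0, 1]$ of the ODE $\frac{d\phi_t}{dt} = v_t^{n,\lambda_n}(\phi_t)$. We shall now control the empirical error $R_n(\hat f_n^c, f) = \frac{1}{n} \sum_{i=1}^{n} (\hat f_n^c(x_i) - f(x_i))^2$. First, 

$$\sum_{i=1}^{n} (\hat f_n^c(x_i) - f(x_i))^2 = \sum_{i=1}^{n} \left( \phi_1^{n,\lambda_n}(x_i) - \phi_1^f(x_i) \right)^2 = \sum_{i=1}^{n} \left( \int_0^1 \left( v_t^{n,\lambda_n}(\phi_t^{n,\lambda_n}(x_i)) - v_t^f(\phi_t^f(x_i)) \right) dt \right)^2,$$

and note that for any $t \in [0, 1]$ 

$$\sum_{i=1}^{n} \left( \phi_t^{n,\lambda_n}(x_i) - \phi_t^f(x_i) \right)^2 = \sum_{i=1}^{n} \left( \int_0^t \left( v_s^{n,\lambda_n}(\phi_s^{n,\lambda_n}(x_i)) - v_s^f(\phi_s^f(x_i)) \right) ds \right)^2,$$

which implies that (using the Cauchy-Schwarz inequality and the fact that $t \le 1$) 

$$\sum_{i=1}^{n} \left( \phi_t^{n,\lambda_n}(x_i) - \phi_t^f(x_i) \right)^2 \le 2 \int_0^1 \sum_{i=1}^{n} \left( v_s^{n,\lambda_n}(\hat{X}_i^s) - v_s^f(\phi_s^f(x_i)) \right)^2 ds + \int_0^1 4 C_2^2 \|v_s^{n,\lambda_n}\|^2 \sum_{i=1}^{n} \left( \hat{X}_i^s - \phi_s^f(x_i) \right)^2 ds + \int_0^t 4 C_2^2 \|v_s^{n,\lambda_n}\|^2 \sum_{i=1}^{n} \left( \phi_s^{n,\lambda_n}(x_i) - \phi_s^f(x_i) \right)^2 ds.$$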

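To bound this sum, we shall use the following lemma, whose proof can be found in Younes [44]. 

Lemma 7.4 Consider three continuous and positive functions $c_s$, $\gamma_s$ and $u_s$ defined on $[0, 1]$ and such that $u_t \le c_t + \int_0^t \gamma_s u_s \, ds$. Then $u_t \le c_t + \int_0^t c_s \gamma_s e^{\int_s^t \gamma_r dr} ds$. 

Then, if we apply Lemma 7.4 by letting $u_t = \sum_{i=1}^{n} \left( \phi_t^{n,\lambda_n}(x_i) - \phi_t^f(x_i) \right)^2$, $\gamma_t = 4 C_2^2 \|v_t^{n,\lambda_n}\|^2$ and 

$$c_t = 2 \int_0^1 \sum_{i=1}^{n} \left( v_s^{n,\lambda_n}(\hat{X}_i^s) - v_s^f(\phi_s^f(x_i)) \right)^2 ds + \int_0^1 4 C_2^2 \|v_s^{n,\lambda_n}\|^2 \sum_{i=1}^{n} \left( \hat{X}_i^s - \phi_s^f(x_i) \right)^2 ds,$$

we obtain that $u_t \le c_t + \int_0^t c_s \gamma_s e^{\int_s^t \gamma_r dr} ds$. Now recall that by definition of $v^f$ and $\phi^f$, we have $\phi_s^f(x_i) = X_i^s$ and $v_s^f(X_i^s) = f(x_i) - x_i =_{def} \tilde{f}(x_i)$. 

Hence, $\sum_{i=1}^{n} \left( v_s^{n,\lambda_n}(\hat{X}_i^s) - v_s^f(\phi_s^f(x_i)) \right)^2 = \|A_{\lambda_n,s} Y - \tilde{\mathbf{f}}\|_2^2$, since by definition of $A_{\lambda_n,s}$ one has $\left( v_s^{n,\lambda_n}(\hat{X}_1^s), \ldots, v_s^{n,\lambda_n}(\hat{X}_n^s) \right)' = A_{\lambda_n,s} Y$, and where $\tilde{\mathbf{f}} = \left( \tilde{f}(x_1), \ldots, \tilde{f}(x_n) \right)'$. Similarly, we have that $\sum_{i=1}^{n} \left( \hat{X}_i^s - \phi_s^f(x_i) \right)^2 = s^2 \|\hat{\mathbf{f}}_n - \mathbf{f}\|_2^2 \le \|\hat{\mathbf{f}}_n - \mathbf{f}\|_2^2$, where $\hat{\mathbf{f}}_n = \left( \hat f_n(x_1), \ldots, \hat f_n(x_n) \right)'$ and $\mathbf{f} = (f(x_1), \ldots, f(x_n))'$. Hence, 

$$c_t \le 2 \int_0^1 \|A_{\lambda_n,s} Y - \tilde{\mathbf{f}}\|_2^2 \, ds + \int_0^1 4 C_2^2 \|v_s^{n,\lambda_n}\|^2 ds \, \|\hat{\mathbf{f}}_n - \mathbf{f}\|_2^2. \qquad (7.2)$$

Now, remark that $\|A_{\lambda_n,s} Y - \tilde{\mathbf{f}}\|_2 \le \|(I_n - A_{\lambda_n,s}) \tilde{\mathbf{f}}\|_2 + \|A_{\lambda_n,s} (Y - \tilde{\mathbf{f}})\|_2$, and observe that by Lemma 7.2 

$$\|A_{\lambda_n,s} (Y - \tilde{\mathbf{f}})\|_2 \le 2 \|\hat{\mathbf{f}}_n - \mathbf{f}\|_2 \qquad (7.3)$$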

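and let $\tilde{v}_s^{n,\lambda_n}$ be the solution of the following smoothing problem: find $v \in \mathcal{H}$ which 
minimizes $\frac{1}{n} \sum_{i=1}^{n} \left( \tilde{f}(x_i) - v(X_i^s) \right)^2 + \lambda_n \|h\|_K^2$, where $v(x) = \alpha_1 + \alpha_2 x + h(x)$. Then, by 
definition of $\tilde{v}_s^{n,\lambda_n}$ we have that 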



i=l 



Finally, by combining equation (7.3) and the above inequality, we obtain that there exists 
a constant C such that 

||^A,sY-f||2 <C (\%-m + n\^ withC = max(2C2lt;ff,||/i{||x)- (7-4) 

Now, using Lemma 7.3 and combining the above relation with equations (7.2) and (7.4), we finally obtain in probability $c_t \le b(n)$, where $b(n) = C \left( \|\hat{\mathbf{f}}_n - \mathbf{f}\|_2^2 + n \lambda_n \right)$ for some constant $C > 0$. Similarly, by Lemma 7.3 we have that there exists a constant $C_6$ such that (in probability) for any $s, t \in [0, 1]$, $\int_s^t \gamma_r \, dr \le C_6$. Then, by combining the previous inequalities, we derive from Lemma 7.4 that $u_t \le b(n)(1 + C_6 e^{C_6})$. Now, since $R_n(\hat f_n^c, f) = \frac{1}{n} u_1$ and $R_n(\hat f_n, f) = \frac{1}{n} \|\hat{\mathbf{f}}_n - \mathbf{f}\|_2^2$, we finally obtain that there exists a constant $\Lambda_1$ such that, with probability tending to one as $n \to +\infty$, $R_n(\hat f_n^c, f) \le \Lambda_1 \left( R_n(\hat f_n, f) + \lambda_n \right)$, which completes the proof of Theorem 5.1. □ 